Complexity and Expressiveness for Formal Structures in Natural Language Processing
Abstract
The formalized and algorithmic study of human language within the field of Natural Language Processing (NLP) has motivated much theoretical work in the related field of formal languages, in particular the subfields of grammar and automata theory. Motivated and informed by NLP, the papers in this thesis explore the connections between expressibility (the ability of a formal system to define complex sets of objects) and algorithmic complexity (the amount of effort required to analyse and utilise such systems). Our research studies formal systems that operate not just on strings but on more complex structures such as trees and graphs, in particular syntax trees and semantic graphs. The field of mildly context-sensitive languages concerns attempts to find a useful class of formal languages between the context-free and the context-sensitive languages. We study formalisms defining two candidates for this class: the tree-adjoining languages and the languages defined by linear context-free rewriting systems. For the former, we specifically investigate the tree languages, and define a subclass and a tree automaton with linear parsing complexity. For the latter, we use the framework of parameterized complexity theory to investigate the related parsing problems more deeply, as well as the connections between the various formalisms defining the class. The field of semantic modelling aims to model, formally and accurately, not only the syntax of natural language statements but also their meaning. In particular, recent work on semantic graphs motivates our study of graph grammars and graph parsing. To the best of our knowledge, the formalism presented in Paper III of this thesis is the first graph grammar for which the uniform parsing problem can be solved in polynomial time, even for input graphs of unbounded node degree.
Similar resources
An Applied Linguistics Look at the Linguistic Comparison of Nominal Group Complexity between Two Samples of a Genre
The roles and effects of changes in syntax on comprehension and processing effort, and the relationships between these two, comprise a large and separate field of inquiry, with the general belief now in place that such changes and variations bring about varied psycholinguistic and discursive implications for comprehension, manifesting themselves differently in different genres. The current study...
Controlled English Ontology-Based Data Access
As is well known, querying and managing structured data in natural language is a challenging task due to its ambiguity (syntactic and semantic) and its expressiveness. On the other hand, querying, e.g., a relational database or an ontology-based data access system is a well-defined and unambiguous task, namely, the task of evaluating a formal query (e.g., an SQL query) of a limited expressive...
The Question of Expressiveness in the Generation of Referring Expressions
We study the problem of generating referring expressions modulo different notions of expressive power. We define the notion of L-referring expression, for a formal language L equipped with a semantics in terms of relational models. We show that the approach is independent of the particular algorithm used to generate the referring expression by providing examples using the frameworks of (Areces ...
Lexpresso: A Controlled Natural Language
This paper presents an overview of ‘Lexpresso’, a Controlled Natural Language developed at the Defence Science & Technology Organisation as a bidirectional natural language interface to a high-level information fusion system. The paper describes Lexpresso’s main features including lexical coverage, expressiveness and range of linguistic syntactic and semantic structures. It also touches on its ...
Inferring Grammars for Mildly Context Sensitive Languages in Polynomial-Time
Natural languages contain regular, context-free, and context-sensitive syntactic constructions, yet none of these classes of formal languages can be identified in the limit from positive examples. Mildly context-sensitive languages are able to represent some context-sensitive constructions, those most common in natural languages, such as multiple agreement, crossed agreement, and duplication. Th...
Publication date: 2017